Abstract

Speech perception extends beyond the auditory domain: it is a multimodal process, specifically an auditory-visual one, because we also process lip and face movements during speech. In this paper, the findings of cross-language studies of auditory-visual speech perception over the past two decades are interpreted for the applied domain of second language (L2) instruction. This issue has previously been presented in terms of the general applications of auditory-visual speech perception research to L2 acquisition and instruction. The focus of this paper shifts towards Turkish as an L2, a language with unique morphophonemic and phonotactic characteristics that call for specifically designed methods of instruction. Furthermore, Turkish is a language whose popularity has been growing due to recent migratory movements and economic factors. While the need to study Turkish as an L2 is elucidated here from an experimental psychology point of view, the necessity of that scrutiny is also highlighted from an applied and instructional stance.

© 2017 JLLS and the Authors - Published by JLLS.
1. Introduction

When one refers to “speech perception”, what is understood most of the time is a process or phenomenon that takes place entirely in the auditory domain. However, speech perception is not just an auditory process but a multimodal, or at least an auditory-visual, phenomenon. Here, “visual speech” means the lip and face movements made during speech. Before the discovery of the McGurk effect, the role of visual speech information was known empirically only for noisy listening conditions, in which seeing the talker's face can enhance perceived speech clarity by up to 20 dB (Sumby & Pollack, 1954). What the McGurk effect, an illusory experience, demonstrated was that we also make use of visual speech information in clear listening conditions (McGurk & MacDonald, 1976). In a typical case of this illusion, when an auditory input (e.g., /ba/) is coupled with a conflicting visual input (e.g., /ga/), perceivers typically report a percept that is different from the actual physical input stimuli (e.g., /tha/ or /da/ for most English speakers). This effect has also been shown in the contexts of words and sentences (e.g., Sams, Manninen, Surakka & Katto, 1998) and has been widely used as a visual speech metric, wherein the proportion of visually influenced or visually based responses is taken as the extent of visual speech influence (for an alternative measure of auditory-visual speech perception, see Jerger, Damian, Tye-Murray & Abdi, 2014).
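The visual speech metric described above can be illustrated with a minimal Python sketch. It assumes a hypothetical coding scheme in which each response to an incongruent (McGurk) trial is labelled "auditory" (matches the sound track), "visual" (matches the lips) or "fusion" (e.g., /da/ or /tha/), and it uses invented responses; it is not the scoring procedure of any particular study cited here.

```python
from collections import Counter

# Hypothetical coding of responses to incongruent trials (e.g. auditory /ba/ + visual /ga/).
# Both visually based and fused responses count as visually influenced.
VISUALLY_INFLUENCED = {"visual", "fusion"}

def visual_influence(responses: list[str]) -> float:
    """Proportion of incongruent-trial responses that are not purely auditory-based."""
    if not responses:
        raise ValueError("no responses to score")
    counts = Counter(responses)
    influenced = sum(counts[label] for label in VISUALLY_INFLUENCED)
    return influenced / len(responses)

# Invented responses from one hypothetical participant (10 incongruent trials)
responses = ["fusion", "auditory", "fusion", "visual", "auditory",
             "fusion", "fusion", "auditory", "fusion", "visual"]
print(f"{visual_influence(responses):.0%}")  # -> 70%
```

Under this assumed scheme, higher proportions indicate stronger visual speech influence; cross-language comparisons would then contrast these proportions between L1 and L2 stimuli or between perceiver groups.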

The focus of this article is the application of auditory-visual speech perception research, as an experimental psychology enterprise, to the domain of second language instruction. This theme has recently been presented elsewhere in a broader sense and scope (Erdener, 2016); the specific focus here, however, is the instruction of Turkish as a foreign language (L2, hereafter). Following a logical order, the remainder of this article will present the overall literature on cross-language issues in auditory-visual speech perception, extend these first to general issues pertaining to L2 instruction, and then move on to language-specific issues and hypotheses in the context of Turkish as an L2.

Cross-language studies of auditory-visual speech perception, overall, yield two patterns of outcomes. First, the extent to which visual speech information is used in one's native language (L1, hereafter) varies from language to language. This variation arises from a number of phonotactic and other language-specific factors, the most salient of which are exemplified in the studies cited below. Second, cross-language studies of auditory-visual speech perception have very consistently demonstrated that visual speech influence is almost always greater when attending to an L2 than to the L1. In a series of studies using Mandarin, Japanese and American English stimuli and participants, Sekiyama and her colleagues demonstrated the relative robustness of visual speech information in these languages. In one of these studies comparing English and Japanese monolinguals, Sekiyama and Tohkura (1993) found that English speakers were much more prone to the McGurk effect than their Japanese-speaking counterparts. Later, Sekiyama (1997) found that Japanese speakers were relatively more susceptible to the effect than their Mandarin-speaking counterparts. Sekiyama stresses that, in relative terms, there may be less need to integrate visual speech information in some L1s. For instance, Japanese has far fewer visually discernible consonants and vowels than English in terms of the actual numbers of these linguistic elements. Furthermore, Mandarin, one of the target languages in the Sekiyama studies, is a tonal language with four lexical tones which, presumably, are not visually discernible (Burnham et al., 2000).
Beyond this, a comparison of Japanese and English perception of visual speech at the cortical level using fMRI revealed that English speakers, who are more prone to the McGurk illusion, displayed more neural connectivity between the visual motion area and Heschl's gyrus (in the left superior temporal region, which is associated with the integration of auditory and visual speech information) than their Japanese counterparts (Shinozaki, Hiroe, Nagamine & Sekiyama, 2016). In a recent study, Asiaee and her colleagues compared a group of Persian speakers and a group of Kurdish speakers on a series of McGurk stimuli. They found that, relative to English speakers, both groups gave fewer visually influenced responses. However, the Kurdish perceivers were more likely to give such responses than their Persian-speaking counterparts. Although the authors do not discuss why this is the case in the English-language summary of their report, this is most likely because the Kurdish speakers were bilingual in Persian and Kurdish, as opposed to the native Persian-speaking group (Asiaee, Kivanani, & Nourbakhsh, 2016). Returning to the work of Sekiyama and colleagues, a finding of theirs and of other auditory-visual speech perception studies is that when subjects attend to L2 input, their responses are almost always more visually influenced than when they attend to L1 input, which Sekiyama referred to as the foreign speaker effect. This effect has been observed in languages as diverse as Mandarin (Chen & Hazan, 2009), Korean (Kim & Davis, 2001), Dutch, German (Reisberg, McLean & Goldfield, 1987), and Spanish (Ortega-Llebaria, Faulkner & Hazan, 2001). In a study with three speaker groups, namely Korean, Mandarin and English, the participants were presented with English stimuli featuring labiodentals (e.g., /f/ as in flight, absent in Korean), interdentals (e.g., /θ/ as in thick, absent in Korean and Mandarin) and alveolars (e.g., /s/ as in still) in auditory-visual, auditory-only and visual-only conditions. Results revealed that for labiodentals, both Korean and Mandarin speakers performed at native levels owing to their visual discernibility, whereas their performance on interdentals and alveolars was poorer than that of their English-speaking counterparts (Wang, Behne & Jiang, 2009). What both the foreign speaker
effect and the visual discernibility of the target phonemes to be learnt tell us is that exploiting additional perceptual sources, in particular visual speech information, allows for clearer perception and production (Erdener & Burnham, 2005). Such findings are of paramount importance because they are the type of experimental results that have the potential to be translated into applied domains. For instance, the finding that the visual discernibility of phonemes paves the way for clearer (and more accurate) perception and production in an experimental setting means that this highly applicable result can be built into systems of foreign language instruction. One bridging attempt in this domain was a study by Erdener and Burnham (2005), in which they tested monolingual speakers of Australian English and Turkish on a series of stimuli in the following experimental conditions: auditory-only (subjects only heard the target stimuli), auditory-visual (subjects heard the stimuli and saw the face of the talker as they spoke), auditory-visual-orthographic (auditory-visual plus the stimulus presented in written form) and auditory-orthographic (stimuli presented in auditory and written form). The task for the subjects was to attend to each stimulus and repeat it to the best of their ability. The dependent variable was the number of errors committed, as judged by two native speakers of Irish and Spanish who were blind to the experimental conditions under which the
recordings were made. In short, there were two main results in this study. First, as was the case in previous (and subsequent) studies (e.g., Wang et al., 2009), both perception and production were superior when visual information was present than when it was not available. This advantage is typically observed when the speech input is hyperarticulated (Lees & Burnham, 2008; see also the concept of “teacherese” for the use of hyperarticulation in the classroom, Hakansson, 1987). Second, irrespective of the presence of visual information, Turkish subjects appeared to rely on orthographic information more than their Australian counterparts whenever it was available (auditory-visual-orthographic and auditory-orthographic conditions). This was indeed a good strategy whenever the L1 orthography (Turkish) was coherent with the presumed L2 orthography and both were transparent (Spanish), whereas it was not when the target orthography was incoherent and opaque (Irish). By contrast, having awareness of and experience with an (infamously) incoherent orthography, the Australian monolinguals seemed to ignore the orthographic input and rely on auditory and/or visual information (Figure 1).

Figure 1. The collapsed mean correct ratings by native speakers for Spanish and Irish stimulus productions by Turkish and Australian participants in the orthographic experimental conditions (data adapted from Erdener & Burnham, 2005 and used in Erdener, 2016).

1.1. Literature review 

So far, a snapshot of auditory-visual speech perception in the context of cross-language investigations has been presented. The main aim of this paper is to link this predominantly experimental psychological enterprise to an applied setting: L2 instruction. Recently, Erdener (2016) outlined the main avenues through which research in auditory-visual speech perception can be utilised in the context of L2 instruction. Broadly speaking, this paper addresses the same issue; more specifically, however, the focus of interest here is the rather neglected area of Turkish as an L2, and in particular Turkish instruction as an L2 in relation to auditory-visual speech perception research. Auditory-visual speech perception research on Turkish is extremely limited, amounting to fewer than a handful of studies. Therefore, the paper will first present this small number of studies, then present the current methods of Turkish instruction as an L2, and finally propose two lines of research and hypotheses: (a) more auditory-visual speech perception research focusing on language-specific features of Turkish such as its relatively complex morphology; and (b) methods of instruction beyond auditory-only channels, namely how auditory-visual speech perception research can and should be exploited to teach Turkish, whose non-native learners will surely increase as a result of global and economic migratory movements and of the millions of refugees from the Middle East who have taken shelter permanently or long-term in both Turkey and Cyprus. Next, attention will be paid to the language-specific features of Turkish whose instruction may be enhanced by means of auditory-visual rather than auditory-only methods.

1.2. Turkish as a neglected language in auditory-visual speech perception research 

Auditory-visual speech perception is a multidisciplinary area of research attracting investigators from an array of fields including, but not limited to, psychology, engineering and linguistics. Given the attention paid to auditory-visual speech perception and processing, there is an ever-growing body of research in this domain. One of the vantage points of this growing enterprise is psycholinguistic in nature, focusing on cross-language issues. As summed up in the preceding section, cross-language research in auditory-visual speech perception generally reveals that: (a) when we attend to an unfamiliar language, we make use of visual speech information more than when we are exposed to our L1; and (b) the degree to which visual information is used is a function of experience with, or proficiency in, a given L2. While several, mostly European, languages have so far been studied, along with some non-European languages such as Japanese and Mandarin, there is still a significant paucity of auditory-visual speech perception research on other languages. Although some Middle Eastern languages, such as Kurdish and Farsi (Asiaee et al., 2016) and Arabic (Ali, Hassan-Haj, Ingleby & Idrissi, 2005), have started to emerge as target languages, more effort is surely needed in that direction given the widespread use of these languages both in the region and in the world. Turkish is one such language, with unique characteristics that call for scrutiny in the context of auditory-visual speech perception research.

As presented in the section above, the amount of McGurk effect experienced by native speakers of a given L1 varies as a function of the phonological and/or phonotactic requirements of that language. As an understudied language in auditory-visual speech perception research, Turkish has attracted only a couple of studies in this domain. The first of these, presented above, showed that Turkish-speaking participants' speech perception was overwhelmingly influenced by the presence of orthographic information (Erdener & Burnham, 2005). However, as was the case for most other languages, Turkish perceivers' productions of target stimuli were much better and more accurate when the stimuli were presented in auditory-visual format than in auditory-only format. Yet this was not a McGurk study per se. Recently, Erdener (2015) conducted a preliminary experiment in which he presented a group of native Turkish speakers with McGurk-type coherent and incoherent auditory-visual speech stimuli, as well as visual-only (i.e., lipreading-type) stimuli, prepared from English and Turkish material. Results suggested that for both target stimulus languages, the Turkish perceivers performed beyond chance levels, in that they made use of visual speech information and thus were prone to the McGurk effect. An inter-language comparison of visual-only data revealed that Turkish speakers were much better at lipreading in Turkish than in English. In summary, Turkish, along with languages such as English (McGurk & MacDonald, 1976), Italian (Bovo, Ciorba, Prosser & Martini, 2009), Korean (Kim & Davis, 2001), German (Reisberg et al., 1987), Spanish (Ortega-Llebaria et al., 2001) and Kurdish (Asiaee et al., 2016), is a language whose speakers seem to use visual speech information to an extent comparable to speakers of these (and other) languages. From an applied perspective (and the application of basic science data should be an ultimate goal, as per one of the aims of science), this demonstrates that Turkish instruction can benefit from the use of visual speech stimuli. One question remains, though: which aspects of Turkish specifically necessitate the use of visual speech information? Every language has its relatively challenging aspects. For instance, for speakers of non-tonal languages such as English, the acquisition of lexical tones is perplexing. In the case of Turkish, its agglutinative morphological structure is notoriously one aspect whose acquisition can be a significant hurdle. In the next section, we turn our attention to Turkish as an L2 in terms of both its challenges and its methods of instruction, before proceeding to how experimental psychological data may help generate new hypotheses addressing those practical challenges.
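The claim that perceivers performed “beyond chance levels” can be made concrete with a one-sided exact binomial test. The following is a minimal Python sketch using only the standard library; the number of response alternatives, the trial count and the score are hypothetical and are not data from Erdener (2015), and a full analysis would more likely test above-chance performance at the group level.

```python
from math import comb

def binomial_p_above_chance(k: int, n: int, p_chance: float) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p_chance)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    if not 0.0 < p_chance < 1.0:
        raise ValueError("p_chance must lie strictly between 0 and 1")
    return sum(comb(n, i) * p_chance**i * (1 - p_chance)**(n - i)
               for i in range(k, n + 1))

# Hypothetical visual-only (lipreading) task with 4 response alternatives,
# so chance is assumed to be 0.25; one perceiver gets 20 of 40 trials correct.
p = binomial_p_above_chance(k=20, n=40, p_chance=0.25)
print(f"p = {p:.5f}")
```

A small p-value here would indicate that the hypothetical perceiver identified the visual-only stimuli more often than guessing among four alternatives would predict.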

1.3. Turkish as an L2 

1.3.1. Teaching Turkish as L2: current materials of instruction 

Similar to their counterparts for teaching English, the materials currently in use to teach Turkish as an L2 are mostly in conventional formats such as printed books and audio material. To the best of the author's knowledge, there is no auditory-visual material in use for teaching Turkish. Most material available in circulation and in classrooms focuses on grammar (Hengirmen, 2001; Arslan, 2011; Ketrez, 2012), or on a combination of grammar, photographic and audio material (TOMER, 2012). In the light of currently available research, the inclusion of dynamic video material allows for better perception and production (e.g., Erdener & Burnham, 2005) of the material to be learnt.

Given the paucity of auditory-visual teaching material, as well as the very limited amount of research on Turkish in the context of auditory-visual speech perception, it is worth considering which special aspects of Turkish are worthwhile to study. One of the most difficult aspects of Turkish for foreign learners is its quite complex morphological structure and, thus, its rules of agglutination. The next section focuses on what benefits we can obtain from studying Turkish and its unique aspects in the auditory-visual speech perception domain and, in turn, how the findings of this research can be implemented in the teaching methods of Turkish as a foreign language.

1.3.2. Extrapolating from auditory-visual speech research for Turkish as L2 

There are only a couple of preliminary studies on Turkish in the context of auditory-visual speech perception. One recent study demonstrated that the McGurk effect is salient in this language, such that its native speakers process visual speech information at greater than chance levels (Erdener, 2016). The other is the aforementioned work by Erdener and Burnham (2005), which showed that the inclusion of visual speech information enhances pronunciation in a target language, and that orthographic information additionally helps when the target language has a transparent orthography. Beyond these, there has been no auditory-visual speech perception study of the Turkish language. However, Turkish has some unique characteristics that surely call for scrutiny of the extent to which visual speech information may be useful for the acquisition of Turkish as an L2. One such characteristic is its complex morphological structure.

Previous studies, as cited above, in most cases present clear-cut evidence of beneficial effects on both the perception (e.g., Wang et al., 2009) and the production (e.g., Erdener & Burnham, 2005) of target stimuli. Although the facilitative effect of visual material in L2 teaching has been broadly expressed before (Çakır, 2006), no relatively sophisticated hypotheses or applied implications of auditory-visual speech perception research have ever been advanced, especially for Turkish. The next section aims to do just that.

1.4. Turkish morphology and auditory-visual speech perception 

1.4.1. Turkish morphophonemic structure 

So far, we know only two things about the behaviour of native speakers of Turkish in the context of auditory-visual speech perception: (a) native speakers of Turkish are prone to the McGurk illusion in their native language and thus make use of visual speech information when it is available or needed (Erdener, 2016); and (b) when responding to foreign language input, they are bound to and influenced by orthographic information whenever it is available, irrespective of how coherent or incoherent the phoneme-grapheme correspondences are in a given target language, and this occurs more when visual speech information is available (in fact, with or without the orthography), which provides further behavioural, non-McGurk evidence (Erdener & Burnham, 2005). Although this is a very limited amount of information, as sketched above, we know absolutely nothing about how learners of Turkish as a foreign language behave in response to Turkish language stimuli. In this respect, we propose an experimental roadmap to study the peculiar and difficult aspects of Turkish as an L2, which may be easier to learn and, of course, to teach once we understand the auditory-visual aspects of those peculiarities. Although there have been attempts at developing teaching

One salient aspect of Turkish is its agglutination. In Turkish, word structures are formed by the productive affixation of derivational and inflectional morphemes to root words (Oflazer, Göçmen & Bozşahin, 2014). The morphophonemic rules of Turkish are very complex; however, they are all based on simple principles such as vowel harmony, under which vowels can be deleted, modified or subject to exceptions, as is the case with a large body of vocabulary borrowed particularly from languages such as Arabic and Farsi (Oflazer et al., 2014). Albeit based on a small number of principles, the morphophonemic rules of Turkish allow for the construction of very complex lexical structures (Oflazer et al., 2014) such as “Britanyalılaştıramadıklarımızdanmışsınız”. In this rather marginal example there are eleven morphemes, Britanya-lı-laş-tır-ama-dık-lar-ımız-dan-mış-sınız, approximately meaning “It is said that you are one of those whom we could not make British”.
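The “simple principles” behind such forms can be illustrated with a minimal Python sketch of suffix vowel harmony. It is a deliberate simplification and rests on stated assumptions: lowercase, consonant-final stems; no consonant alternations, buffer consonants or loanword exceptions; and invariant suffixes such as -yor are not handled. The function names are hypothetical. The sketch also checks that every suffix vowel in the eleven-morpheme example is predicted by harmony with the material preceding it.

```python
# Simplified Turkish suffix vowel harmony (assumption: consonant-final stems,
# no consonant alternations or loanword exceptions, no invariant suffixes like -yor).
FRONT = set("eiöü")
BACK = set("aıou")
VOWELS = FRONT | BACK
FOUR_WAY = {"e": "i", "i": "i", "a": "ı", "ı": "ı",
            "o": "u", "u": "u", "ö": "ü", "ü": "ü"}

def last_vowel(stem: str) -> str:
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {stem!r}")

def two_way(stem: str) -> str:
    """Two-fold harmony (e/a), e.g. the plural -ler/-lar."""
    return "e" if last_vowel(stem) in FRONT else "a"

def four_way(stem: str) -> str:
    """Four-fold harmony (i/ı/u/ü): front/back and rounded/unrounded."""
    return FOUR_WAY[last_vowel(stem)]

def plural(stem: str) -> str:
    return stem + "l" + two_way(stem) + "r"

def our(stem: str) -> str:
    """1st person plural possessive -imiz/-ımız/-umuz/-ümüz after a consonant."""
    v = four_way(stem)
    return stem + v + "m" + v + "z"

def suffix_vowels_harmonise(root: str, suffixes: list[str]) -> bool:
    """Check each suffix vowel against the harmony predicted by the preceding material."""
    word = root
    for suffix in suffixes:
        for ch in suffix:
            if ch in VOWELS:
                expected = two_way(word) if ch in "ea" else four_way(word)
                if ch != expected:
                    return False
            word += ch
    return True

for stem in ["ev", "okul", "göz", "kız"]:
    print(stem, plural(stem), our(stem))
# ev evler evimiz
# okul okullar okulumuz
# göz gözler gözümüz
# kız kızlar kızımız

MORPHEMES = ["Britanya", "lı", "laş", "tır", "ama", "dık",
             "lar", "ımız", "dan", "mış", "sınız"]
print("".join(MORPHEMES), len(MORPHEMES))  # Britanyalılaştıramadıklarımızdanmışsınız 11
print(suffix_vowels_harmonise(MORPHEMES[0], MORPHEMES[1:]))  # True
```

Two binary choices (front/back, and rounded/unrounded for high vowels) thus determine every suffix vowel in the long example, which is the sense in which complex forms rest on simple principles.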

Generally speaking, the morphophonemic rules are instantiated in the construction of lexical items in two ways: the nominal paradigm and the verbal paradigm (Oflazer et al., 2014). In the nominal paradigm, the rules specify how inflectional suffixes are applied to nouns and adjectives; in the verbal paradigm, they specify how suffixes are appended to verbs.
Although the underlying principles are simple, anecdotal evidence from teachers of Turkish as a foreign language suggests that the acquisition of these morphophonemic rules is quite challenging for foreign learners of Turkish (Tayfun Can Onuk, personal communication, September 26, 2016). Therefore, investigating ways of improving the teaching of this rather peculiar and difficult-to-acquire aspect of Turkish as an L2 is a must from both basic and applied science points of view. In the final section of this paper, I will attempt to present an experimental method through which one can identify (a) the extent to which providing visual speech information may enhance the acquisition of the complex morphophonemic rules of Turkish, (b) the factors that may be at work in both the perception and production of Turkish morphological stimuli varying from simple to complex, and (c) how possible findings from this type of experimental work may be implemented in the creation of new instructional material.

1.4.2. A framework for the study of Turkish morphophonemic structure in the context of auditory-visual speech perception

As argued above, the study of the morphophonemic structure of Turkish as an L2 is imperative given that Turkish is a language whose popularity and number of non-native speakers are on the rise. Given the applied (i.e., instructional) concerns of the ideas expressed here, which emanate from experimental work, a method that adopts both perception and production as dependent variables (measured with both quantitative and qualitative methods) seems to be the way to go. In that respect, a proposed study should also consider a series of relevant variables presented above. First, given the intricacy of Turkish morphology, one must use Turkish target stimuli with varying levels of morphophonemic complexity. Second, given the way Turkish morphophonemic rules are instantiated, the nominal and verbal paradigms need to be examined as well, in order to see whether there are any differences between these two types of inflection. Third, the effect of talker needs to be considered, since it is usually assumed that female speakers are clearer talkers than male speakers. Another variable is whether talkers pronounce target stimuli clearly for learners or rather in a way addressed to native speakers, that is, hyper- vs hypo-articulated speech. Teachers in classrooms usually tend to employ hyperarticulated speech implicitly (Hakansson, 1987), which has been found to yield better perception of target stimuli (Lees & Burnham, 2005). Lastly, and naturally, the effect of providing both visual speech information in the form of orofacial movements and orthographic information (Erdener & Burnham, 2005) should be considered. Both visual speech and orthographic information have been found to be useful under certain conditions, and especially in the case of Turkish, one would predict that the provision of a transparent orthography should allow for better perception, and hopefully better production, of morphophonemically complex Turkish vocabulary by non-native learners.
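One way to make the proposed framework concrete is to enumerate the fully crossed factorial design implied by the variables listed above. The following is a minimal Python sketch; the factor names, the three complexity levels and the presentation labels (A, AV, AO, AVO, modelled on the four conditions of Erdener & Burnham, 2005) are illustrative assumptions rather than a specification from this paper.

```python
from itertools import product

# Hypothetical factor levels for the proposed design; the three complexity
# levels are an illustrative assumption.
FACTORS = {
    "complexity": ["low", "medium", "high"],        # morphophonemic complexity
    "paradigm": ["nominal", "verbal"],              # nominal vs verbal inflection
    "talker": ["female", "male"],
    "articulation": ["hypo", "hyper"],
    "presentation": ["A", "AV", "AO", "AVO"],       # auditory(-visual)(-orthographic)
}

def design_cells(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return every combination of factor levels as one dict per design cell."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

cells = design_cells(FACTORS)
print(len(cells))  # 3 * 2 * 2 * 2 * 4 = 96
print(cells[0])
```

With 96 cells, a fully within-subject version would likely be too long for learners, so in practice some factors (for example, presentation condition) might be manipulated between subjects or counterbalanced; perception accuracy and production ratings would then be recorded per cell.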


2. Conclusion 

In summary, the paucity of research on Turkish in the context of auditory-visual speech perception is in itself a spur to further exploration of a language with a distinct set of features that differ from those of Western languages and most other languages studied so far. In turn, this would pave the way to the discovery of creative ways of using visual speech information in language instruction settings and, most likely, of new methods of both research and instruction.
Teaching Turkish as a foreign language: Implications from experimental psychology


Abstract

Contrary to what was long believed, speech perception is not, in essence, a process confined to the auditory domain. While it is a multimodal process, it can be described as an auditory-visual one: “visual” here refers to the inclusion of face and lip movements, alongside the heard stimulus, in the speech perception process. In this article, cross-language studies, which make up an important part of the auditory-visual speech perception literature, are interpreted in terms of their applicability to second language instruction. This topic was recently addressed by relating the auditory-visual speech perception literature, which has been developing over the last two decades, to its possible applications in general language instruction (Erdener, 2016). Going somewhat beyond that point, the focus of this article is the use of this literature in teaching Turkish as a second language, a language with a distinctive structure and phonotactic characteristics whose number of learners is expected to increase due to recent waves of migration and economic factors. While the experimental reasons for studying the acquisition of Turkish as a second language are examined, the reasons why this is necessary are also explained from an applied and instructional standpoint.

Keywords: auditory-visual speech perception; L2; second language acquisition; psychology of language; Turkish